Recurrent Networks: Second Order Properties and Pruning

Authors

  • Morten With Pedersen
  • Lars Kai Hansen
Abstract

Second order properties of cost functions for recurrent networks are investigated. We analyze a layered fully recurrent architecture; the virtue of this architecture is that it features the conventional feedforward architecture as a special case. A detailed description of recursive computation of the full Hessian of the network cost function is provided. We discuss the possibility of invoking simplifying approximations of the Hessian and show how weight decays iron the cost function and thereby greatly assist training. We present tentative pruning results, using Hassibi et al.'s Optimal Brain Surgeon, demonstrating that recurrent networks can construct an efficient internal memory.

1 LEARNING IN RECURRENT NETWORKS

Time series processing is an important application area for neural networks, and numerous architectures have been suggested, see e.g. (Weigend and Gershenfeld, 94). The most general structure is a fully recurrent network, and it may be adapted using Real Time Recurrent Learning (RTRL) as suggested by (Williams and Zipser, 89). By invoking a recurrent network, the length of the network memory can be adapted to the given time series, while it is fixed for the conventional lag-space net (Weigend et al., 90). In forecasting, however, feedforward architectures remain the most popular structures; only a few applications based on the Williams & Zipser approach have been reported. The main difficulties experienced using RTRL are slow convergence and lack of generalization. Analogous problems in feedforward nets are solved using second order methods for training and pruning (LeCun et al., 90; Hassibi et al., 92; Svarer et al., 93). Also, regularization by weight decay significantly improves training and generalization. In this work we initiate the investigation of second order properties for RTRL: a detailed calculation scheme for the cost function Hessian is presented, the importance of weight decay is demonstrated, and preliminary pruning results using Hassibi et al.'s Optimal Brain Surgeon (OBS) are presented. We find that the recurrent network discards the available lag space and constructs its own efficient internal memory.

1.1 REAL TIME RECURRENT LEARNING

The fully connected feedback nets studied by Williams & Zipser operate like a state machine, computing the outputs of the internal units according to a state vector $z(t)$ containing previous external inputs and internal unit outputs. Let $x(t)$ denote a vector containing the external inputs to the net at time $t$, and let $y(t)$ denote a vector containing the outputs of the units in the net. We now arrange the indices on $x$ and $y$ so that the elements of $z(t)$ can be defined as

$$ z_k(t) = \begin{cases} x_k(t), & k \in I \\ y_k(t), & k \in U \end{cases} $$

where $I$ denotes the set of indices for which $z_k$ is an input, and $U$ denotes the set of indices for which $z_k$ is the output of a unit in the net. Thresholds are implemented using an input permanently clamped to unity. The $k$'th unit in the net is now updated according to

$$ y_k(t+1) = f_k[s_k(t+1)] = f_k\Big[ \sum_{j \in I \cup U} w_{kj} z_j(t) \Big] $$

where $w_{kj}$ denotes the weight to unit $k$ from input/unit $j$ and $f_k(\cdot)$ is the activation function of the $k$'th unit. When used for time series prediction, the input vector (excluding the threshold) is usually defined as $x(t) = [x(t), \ldots, x(t-L+1)]$, where $L$ denotes the dimension of the lag space. One of the units in the net is designated to be the output unit $y_o$, and its activation function $f_o$ is often chosen to be linear in order to allow for an arbitrary dynamical range. The prediction of $x(t+1)$ is $\hat{x}(t+1) = f_o[s_o(t)]$.
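To make the update concrete, here is a minimal NumPy sketch of one state-machine step; the names (wz_step, W, x_t, y_t) and the use of tanh for every unit are illustrative assumptions, not the authors' code, and in practice the designated output unit would keep a linear activation:

    import numpy as np

    def wz_step(W, x_t, y_t, f=np.tanh):
        # One Williams & Zipser update step (sketch, not the paper's code).
        # W   : (n_units, 1 + n_inputs + n_units) weights; column 0 multiplies
        #       the threshold input, which is permanently clamped to unity.
        # x_t : external inputs at time t, i.e. the lag vector
        #       [x(t), ..., x(t - L + 1)]
        # y_t : unit outputs y(t)
        z_t = np.concatenate(([1.0], x_t, y_t))   # state vector z(t), with threshold
        s = W @ z_t                               # net inputs s_k(t + 1)
        return f(s)                               # y(t + 1) = f[s(t + 1)]

Iterating this step from $y(0) = 0$ over the series, the prediction $\hat{x}(t+1)$ is read off the designated output unit's entry of the returned vector.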
Also, if the first prediction is at $t = 1$, the first example is presented at $t = 0$ and we set $y(0) = 0$.

We analyse here a modification of the standard Williams & Zipser construction that is appropriate for forecasting purposes. The studied architecture is layered. Firstly, we remove the external inputs from the linear output unit in order to prevent the network from getting trapped in a linear mode. The output then reads

$$ \hat{x}(t+1) = y_o(t+1) = \sum_{j \in U} w_{oj} y_j(t) + w_{\mathrm{thres},o} \qquad (1) $$

Since $y(0) = 0$, we obtain a first prediction $\hat{x}(1) = w_{\mathrm{thres},o}$, which is likely to be a poor prediction, thereby introducing a significant error that is fed back into the network and used in future predictions. Secondly, when pruning a fully recurrent feedback net we would like the net to be able to reduce to a simple two-layer feedforward net if necessary. Note that this is not possible with the conventional Williams & Zipser update rule, since it does not include a layered feedforward net as a special case. In a layered feedforward net the output unit is disconnected from the external inputs; in this case, cf. (1), we see that $\hat{x}(t+1)$ is based on the internal 'hidden' unit outputs $y_k(t)$, which are calculated on the basis of $z(t-1)$ and thereby $x(t-1)$. Hence, besides the startup problems, we also get a two-step-ahead predictor using the standard architecture.

In order to avoid the problems with the conventional Williams & Zipser update scheme, we use a layered updating scheme inspired by traditional feedforward nets, in which we distinguish between hidden layer units and the output unit. At time $t$, the hidden units work from the input vector $z^h(t)$,

$$ z_k^h(t) = \begin{cases} x_k(t), & k \in I \\ y_k^h(t-1), & k \in U \\ y^o(t-1), & k = o \end{cases} $$

where $I$ denotes the input indices, $U$ the hidden layer units, and $o$ the output unit. Further, we use superscripts $h$ and $o$ to distinguish between hidden units and the output unit. The activation of the hidden units is calculated according to

$$ y_k^h(t) = f_k^h[s_k^h(t)] = f_k^h\Big[ \sum_{i \in I \cup U \cup O} w_{ki} z_i^h(t) \Big], \quad k \in U \qquad (2) $$

The hidden unit outputs are forwarded to the output unit, which then sees the input vector $z^o(t)$,

$$ z_k^o(t) = \begin{cases} y_k^h(t), & k \in U \\ y^o(t-1), & k = o \end{cases} $$

and is updated according to

$$ y^o(t) = f^o[s^o(t)] = f^o\Big[ \sum_{i \in U \cup O} w_{oi} z_i^o(t) \Big] $$
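A corresponding sketch of one step of the layered scheme, under the same caveats (Wh, Wo and the tanh hidden / linear output split are illustrative choices, not taken from the paper):

    import numpy as np

    def layered_step(Wh, Wo, x_t, yh_prev, yo_prev):
        # One step of the layered update scheme, cf. eqs. (1) and (2) (sketch).
        # Wh : (n_hidden, 1 + n_inputs + n_hidden + 1) hidden-layer weights
        # Wo : (1 + n_hidden + 1,) output weights (threshold, hidden, self-feedback)
        # Hidden units see the external inputs and all unit outputs from t - 1.
        zh = np.concatenate(([1.0], x_t, yh_prev, [yo_prev]))
        yh = np.tanh(Wh @ zh)                     # eq. (2), with tanh as f^h
        # The linear output unit sees only the *current* hidden outputs and its
        # own previous output -- no direct connection to the external inputs.
        zo = np.concatenate(([1.0], yh, [yo_prev]))
        yo = float(Wo @ zo)                       # prediction of the next value
        return yh, yo

Because the recurrent terms enter only through yh_prev and yo_prev, pruning those feedback weights to zero reduces the scheme to an ordinary two-layer feedforward net, which is precisely the special case the standard update rule lacks.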


Publication date: 1994